12th iThome Ironman Contest

DAY 24

DevOps

Series "Docker獸究極進化～～ Kubernetes獸", part 24

Day-24 Learning Taints and Tolerations

Tags: 12th Ironman Contest, devops, kubernetes

flynncanfly

Team 404 Not Found

2020-10-09 10:45:38

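Introduction

In Day-23 Affinity and Anti-Affinity we covered affinity, a property of a Pod that attracts it to a particular class of Nodes, either as a preference or as a hard requirement. Taints work the other way around: they let a Node repel certain Pods.

Tolerations are set on Pods. A toleration allows (but does not require) the Pod to be scheduled onto a Node that has a matching taint.

Taints and tolerations work together to keep Pods off unsuitable Nodes. One or more taints can be applied to a Node, and the Node will then not accept any Pod that does not tolerate those taints.

How to use taints?

Step 0

As in earlier posts, we first remove all Pods from the cluster so that we start from a clean test environment.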

$ kubectl get pod 
No resources found in default namespace.

Step 1

We add a taint with the key ironman to each of the three Nodes, giving each Node a different value (one, two, three) and the effect NoSchedule.

$ kubectl get node
NAME                                                STATUS   ROLES    AGE   VERSION
gke-my-first-cluster-1-default-pool-dddd2fae-j0k1   Ready    <none>   12d   v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-rfl8   Ready    <none>   12d   v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-tz38   Ready    <none>   12d   v1.18.6-gke.3504

$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 ironman=one:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 tainted
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 ironman=two:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 tainted
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-tz38 ironman=three:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-tz38 tainted

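P.S. To remove a taint, run the same command with a trailing - after the effect. This is shown only for reference; the remaining steps assume that all three taints are still in place.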

$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 ironman:NoSchedule-
node/gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 untainted

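Step 2

operator

Add tolerations to the Pod (here through a Deployment). Two operators are available:

Equal: matches only when the toleration's value equals the taint's value. This is the default when operator is omitted.
Exists: matches whenever the key exists; no value should be specified.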

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"

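There are two special cases:

If the toleration's key is empty and the operator is Exists, the toleration matches every key, value, and effect, so it tolerates every taint.
If effect is empty, the toleration matches every effect (NoSchedule, PreferNoSchedule, NoExecute) of taints whose key is "key".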
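The matching rules above can be sketched in a few lines of Python. This is only a simplified model for illustration, not the scheduler's actual code; the names Taint, Toleration, and tolerates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Equal is the default when operator is omitted
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (simplified model)."""
    # An empty effect is a wildcard; otherwise the effects must be identical.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key is only valid together with Exists and matches every key.
    if tol.key == "":
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    # Exists ignores the value entirely.
    if tol.operator == "Exists":
        return True
    # Equal requires identical values.
    return tol.operator == "Equal" and tol.value == taint.value


taint = Taint("ironman", "two", "NoSchedule")
print(tolerates(Toleration("ironman", "Equal", "two", "NoSchedule"), taint))
print(tolerates(Toleration("ironman", "Equal", "one", "NoSchedule"), taint))
print(tolerates(Toleration(operator="Exists"), taint))
```

The three calls cover an exact Equal match, an Equal mismatch on the value, and the "empty key plus Exists" wildcard.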

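effect

Besides NoSchedule, the effect of a taint (set on the Node in Step 1) can also be PreferNoSchedule or NoExecute. Kubernetes first filters out the taints for which the Pod has a matching toleration; the effects of the remaining taints decide what happens:

NoSchedule: if at least one remaining taint has the effect NoSchedule, Kubernetes does not schedule the Pod onto the Node. Pods already running on the Node are not affected.
PreferNoSchedule: a soft version of NoSchedule. Kubernetes tries to avoid scheduling the Pod onto the Node, but may still do so if no better Node is available.
NoExecute: if at least one remaining taint has the effect NoExecute, Kubernetes does not schedule the Pod onto the Node and also evicts the Pod if it is already running there.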
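As a rough illustration of the three effects, the sketch below takes the effects of the taints that were not filtered out by a toleration and summarises how the Node treats the Pod. The function name placement and the result strings are hypothetical labels chosen for this example, not Kubernetes terms.

```python
def placement(untolerated_effects, already_running=False):
    """Summarise how a Node treats a Pod, given the effects of the taints
    that none of the Pod's tolerations matched (simplified model)."""
    effects = set(untolerated_effects)
    if "NoExecute" in effects:
        # Never scheduled; evicted if it is already running on the Node.
        return "evict" if already_running else "reject"
    if "NoSchedule" in effects:
        # Never scheduled, but a Pod that is already running is left alone.
        return "keep" if already_running else "reject"
    if "PreferNoSchedule" in effects:
        # Soft rule: the scheduler avoids the Node but may still use it.
        return "keep" if already_running else "avoid-if-possible"
    # No untolerated taints remain.
    return "keep" if already_running else "allow"


print(placement(["PreferNoSchedule"]))
print(placement(["NoSchedule", "PreferNoSchedule"]))
print(placement(["NoExecute"], already_running=True))
```

The strictest remaining effect wins, which is why NoExecute is checked first and PreferNoSchedule last.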

Here we use the Deployment from the repository as an example.

ironman-1.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ironman-1
  labels:
    name: ironman
    app: ironman
spec:
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: ironman
  replicas: 1
  template:
    metadata:
      labels:
        app: ironman
        ironman: one
    spec:
      tolerations:
        - key: "ironman"
          operator: "Equal"
          value: "two"
          effect: "NoSchedule"
      containers:
       - name: ironman
         image: ghjjhg567/ironman:latest
         imagePullPolicy: Always
         ports:
           - containerPort: 8100
         resources:
           limits:
             cpu: "1"
             memory: "2Gi"
           requests:
             cpu: 500m
             memory: 256Mi
         envFrom:
           - secretRef:
               name: ironman-config
         command: ["./docker-entrypoint.sh"]
       - name: redis
         image: redis:4.0
         imagePullPolicy: Always
         ports:
           - containerPort: 6379
       - name: nginx
         image: nginx
         imagePullPolicy: Always
         ports:
           - containerPort: 80
         volumeMounts:
           - mountPath: /etc/nginx/nginx.conf
             name: nginx-conf-volume
             subPath: nginx.conf
             readOnly: true
           - mountPath: /etc/nginx/conf.d/default.conf
             subPath: default.conf
             name: nginx-route-volume
             readOnly: true
         readinessProbe:
           httpGet:
             path: /v1/hc
             port: 80
           initialDelaySeconds: 5
           periodSeconds: 10
      volumes:
        - name: nginx-conf-volume
          configMap:
            name: nginx-config
        - name: nginx-route-volume
          configMap:
            name: nginx-route-volume

Step 3

Let's deploy it!

$ kubectl apply -f ironman-1.yaml
deployment.apps/ironman-1 created

$ kubectl get pod --watch
NAME                         READY   STATUS              RESTARTS   AGE
ironman-1-5d5d8cbc6c-hrfgv   0/3     ContainerCreating   0          7s

ironman-1-5d5d8cbc6c-hrfgv   2/3     Running             0          11s
ironman-1-5d5d8cbc6c-hrfgv   2/3     Running             0          16s
ironman-1-5d5d8cbc6c-hrfgv   3/3     Running             0          22s
ironman-1-5d5d8cbc6c-hrfgv   3/3     Running             0          22s

Step 4

Next, let's check which Node the Pod was scheduled onto.

$ kubectl describe pod ironman-1-5d5d8cbc6c-hrfgv
Name:         ironman-1-5d5d8cbc6c-hrfgv
Namespace:    default
Priority:     0
Node:         gke-my-first-cluster-1-default-pool-dddd2fae-rfl8/10.140.0.2
Start Time:   Tue, 06 Oct 2020 13:21:53 +0800

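As expected, it was scheduled onto Node2 (the Node whose name ends in rfl8):

Node1 has the taint ironman=one:NoSchedule and the ironman-1 Pod has no toleration for it, so the Pod cannot be scheduled there. The Pod's label ironman: one plays no part here, because taints are matched against tolerations, not labels.
Node3 has the taint ironman=three:NoSchedule, which the Pod does not tolerate either, so it is excluded as well.
Node2 has the taint ironman=two:NoSchedule, which the Pod's toleration matches exactly (key ironman, operator Equal, value two, effect NoSchedule). It is therefore the only Node the Pod can be scheduled onto. Note that a toleration does not raise a Node's priority; it only stops the taint from excluding the Pod.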
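The reasoning above can be replayed with the actual values from Step 1 and ironman-1.yaml. This is a minimal sketch that assumes all three taints are still in place and that only the Equal operator is involved, so a plain tuple comparison is enough.

```python
# Taints applied in Step 1: node name -> (key, value, effect)
node_taints = {
    "gke-my-first-cluster-1-default-pool-dddd2fae-j0k1": ("ironman", "one", "NoSchedule"),
    "gke-my-first-cluster-1-default-pool-dddd2fae-rfl8": ("ironman", "two", "NoSchedule"),
    "gke-my-first-cluster-1-default-pool-dddd2fae-tz38": ("ironman", "three", "NoSchedule"),
}

# Toleration from ironman-1.yaml (operator Equal): key, value, effect
pod_toleration = ("ironman", "two", "NoSchedule")

# With operator Equal, a Node is schedulable only if key, value, and effect
# of its taint all equal those of the toleration.
schedulable = [name for name, taint in node_taints.items() if taint == pod_toleration]
print(schedulable)
```

Only the rfl8 Node remains in the list, which agrees with the kubectl describe output in Step 4.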

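When to use taints and tolerations?

Dedicated Nodes: if you want to reserve some Nodes for a particular group of users, taint those Nodes with the command below and add the corresponding toleration to that group's Pods. Those Pods can then be scheduled onto both the dedicated Nodes and ordinary Nodes. If the group's Pods should run only on the dedicated Nodes, additionally label those Nodes (for example dedicated=groupName) and add a matching node affinity to the Pods.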

$ kubectl taint nodes nodename dedicated=groupName:NoSchedule

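Nodes with Special Hardware: some Nodes may be equipped with special hardware such as GPUs. You want the Pods that need this hardware to be scheduled onto these Nodes, and you do not want other Pods to take up their resources. Taints can achieve this as well.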

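Taint based Evictions

As mentioned above, the NoExecute effect also affects Pods that are already running on a Node:

A Pod that does not tolerate the NoExecute taint is evicted immediately.
A Pod that tolerates the NoExecute taint and does not specify tolerationSeconds keeps running on the Node indefinitely.
A Pod that tolerates the NoExecute taint and specifies tolerationSeconds keeps running on the Node for that many seconds after the taint is added, and is then evicted.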
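The three cases can be summarised as a small function that computes when a running Pod would be evicted after a NoExecute taint is added. This is a sketch for illustration only; the name eviction_time and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_time(taint_added: datetime,
                  tolerated: bool,
                  toleration_seconds: Optional[int] = None) -> Optional[datetime]:
    """Return when the Pod is evicted, or None if it is never evicted
    because of this taint (simplified model)."""
    if not tolerated:
        # No matching toleration: evicted immediately.
        return taint_added
    if toleration_seconds is None:
        # Tolerated without a time limit: the Pod stays indefinitely.
        return None
    # Tolerated for a limited time; zero or negative values mean "immediately".
    return taint_added + timedelta(seconds=max(0, toleration_seconds))


added = datetime(2020, 10, 6, 13, 0, 0)
print(eviction_time(added, tolerated=False))
print(eviction_time(added, tolerated=True))
print(eviction_time(added, tolerated=True, toleration_seconds=6000))
```

With tolerationSeconds set to 6000, the Pod in this example would be evicted 100 minutes after the taint was added.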

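Taints added automatically by the Node Controller

Under certain conditions the Node Controller automatically adds a taint to a Node:

node.kubernetes.io/not-ready: the Node is not ready. This corresponds to the Node condition Ready being False.
node.kubernetes.io/unreachable: the Node Controller cannot reach the Node. This corresponds to the Node condition Ready being Unknown.
node.kubernetes.io/out-of-disk: the Node has run out of storage.
node.kubernetes.io/memory-pressure: the Node is under memory pressure.
node.kubernetes.io/disk-pressure: the Node is under storage pressure.
node.kubernetes.io/network-unavailable: the Node's network is unavailable.
node.kubernetes.io/unschedulable: the Node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an "external" cloud provider, this taint is set on the Node to mark it as unusable. After a controller from the cloud-controller-manager initializes the Node, the kubelet removes this taint.

How to use?

For example, an application with a lot of local state may prefer to stay on its current Node for a longer time when the network is partitioned, waiting for the network to recover rather than being evicted. In that case the Pod's toleration could look like this: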

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

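Conversely, a storage-heavy service that should stay away from Nodes under storage pressure needs no toleration at all: the node.kubernetes.io/disk-pressure taint already keeps Pods without a matching toleration from being scheduled there. The toleration below does the opposite and allows a Pod to be scheduled onto such a Node. Taints added for Node conditions such as disk pressure use the NoSchedule effect, so the toleration uses NoSchedule as well:

tolerations:
- key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"
  effect: "NoSchedule"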

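DaemonSet

The DaemonSet controller automatically adds the following NoSchedule tolerations to all daemon Pods so that DaemonSets do not break:

node.kubernetes.io/memory-pressure
node.kubernetes.io/disk-pressure
node.kubernetes.io/out-of-disk (only for critical Pods)
node.kubernetes.io/unschedulable (1.10 or later)
node.kubernetes.io/network-unavailable (host network only)

Adding these tolerations ensures backward compatibility. You are also free to add any other tolerations to a DaemonSet.

Postscript

In this post we introduced taints and tolerations. Combined with Day-23 Affinity and Anti-Affinity, they give us much finer control over how Pods are scheduled onto Nodes.

At the end of this post we mentioned DaemonSet. What is a DaemonSet, and what does it have to do with daemons? We will answer these questions in the next post, so stay tuned!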

Reference

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

Day-23 The abstract Affinity and Anti-Affinity

Day-25 Getting to know DaemonSet
